---
layout: post
date: 2023-11-17
title: "Zig's std.json.Parsed(T)"
description: "A look at why Zig's std.json.Parsed(T) exists."
tags: [zig]
---

<p>When parsing JSON using one of Zig's <code>std.json.parseFrom*</code> functions, you'll have to deal with the return type: an <code>std.json.Parsed(T)</code>:</p>

{% highlight zig %}
const file = try std.fs.openFileAbsolute(file_path, .{});
defer file.close();

var buffered = std.io.bufferedReader(file.reader());
var reader = std.json.reader(allocator, buffered.reader());
defer reader.deinit();

const parsed = try std.json.parseFromTokenSource(Config, allocator, &reader, .{
	.allocate = .alloc_always,
});
defer parsed.deinit();
const config = parsed.value;
{% endhighlight %}

<p>In the above code <code>parseFromTokenSource</code> returns a <code>std.json.Parsed(Config)</code>. This is a simple structure which exposes a <code>deinit</code> function as well as the parsed <code>value</code>. The reason <code>parseFromTokenSource</code> cannot simply return <code>T</code> (<code>Config</code> in the above case), is that parsing JSON likely results in memory allocations. For example, if our imaginary <code>Config</code> struct had a <code>tags: [][]const u8</code> field, then <code>parseFromTokenSource</code> would need to allocate a slice.</p>

<p>Memory allocated while parsing can be difficult to manage, at least in a way that works in all cases. It would be an unreasonable burden to ask the caller to know what needs freeing, especially for complex/nested objects. To work around this, <code>parseFromTokenSource</code> creates an <code>std.heap.ArenaAllocator</code> from which all allocations are made. When <code>deinit()</code> is called on the returned <code>Parsed(T)</code>, the arena allocator is freed and destroyed.</p>
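
<p>As a rough illustration of that ownership model, here is a minimal sketch. It uses <code>std.json.parseFromSlice</code> instead of a file reader to stay self-contained, and both the <code>Config</code> fields and the <code>countTags</code> helper are hypothetical names made up for this example:</p>

{% highlight zig %}
const std = @import("std");

// Hypothetical Config, with the `tags` field imagined above.
const Config = struct {
	name: []const u8,
	tags: [][]const u8,
};

// Hypothetical helper: parses a JSON document and returns its tag count.
// Every allocation made while parsing lives in the arena owned by `parsed`.
fn countTags(allocator: std.mem.Allocator, json: []const u8) !usize {
	const parsed = try std.json.parseFromSlice(Config, allocator, json, .{});
	// A single call releases the `tags` slice, any copied strings and the arena itself.
	defer parsed.deinit();
	return parsed.value.tags.len;
}
{% endhighlight %}

<p>The caller never has to know which fields of <code>Config</code> were heap-allocated. The flip side is that nothing reachable from <code>parsed.value</code> may be used after <code>deinit</code>.</p>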

<p>You might have noticed the <code>.allocate = .alloc_always</code> passed as an option to <code>parseFromTokenSource</code>. This option only relates to whether or not strings are duplicated from the source. In the above example, because our source, <code>file</code>, outlives <code>parsed</code>, we could use the alternative option <code>.alloc_if_needed</code>. But that would not alter the fact that other allocations, such as slices for arrays, would still be created and managed by the arena.</p>

<p>While <code>Parsed(T)</code> is an effective way of dealing with allocations, it's still, in my opinion, cumbersome to use. Specifically, in our above example, <code>config</code> is tied to the lifecycle of <code>parsed</code>. If we wanted to write a <code>loadConfig</code> function, we wouldn't be able to return <code>Config</code>, we'd have to return <code>std.json.Parsed(Config)</code>. A program that deals with a lot of JSON might find itself passing <code>std.json.Parsed(T)</code> all over the place.</p>

<p>Unfortunately, there's no <em>great</em> solution to this problem. If our object is simple, we can clone all allocated fields, thus decoupling the lifetime of our object from <code>std.json.Parsed(T)</code>, but that's neither efficient nor scalable. If you look at the implementation of <code>parseFromTokenSource</code>, you'll see that it creates the <code>ArenaAllocator</code> and calls a <code>Leaky</code> variant:</p>

{% highlight zig %}
fn parseFromTokenSource(comptime T: type, allocator: Allocator, ....) !Parsed(T) {
	var parsed = Parsed(T){
		.arena = try allocator.create(ArenaAllocator),
		.value = undefined,
	};
	errdefer allocator.destroy(parsed.arena);
	parsed.arena.* = ArenaAllocator.init(allocator);
	errdefer parsed.arena.deinit();

	parsed.value = try parseFromTokenSourceLeaky(T, parsed.arena.allocator(), scanner_or_reader, options);

	return parsed;
}
{% endhighlight %}
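
<p>For contrast, the cloning approach mentioned before this listing might look something like the sketch below. The <code>Config</code> fields, <code>loadConfig</code> and <code>Config.deinit</code> are all hypothetical, and <code>std.json.parseFromSlice</code> stands in for the file-based parsing used earlier:</p>

{% highlight zig %}
const std = @import("std");

// Hypothetical Config that owns its memory once returned from `loadConfig`.
const Config = struct {
	name: []const u8,
	tags: [][]const u8,

	// Frees exactly what `loadConfig` allocated.
	fn deinit(self: Config, allocator: std.mem.Allocator) void {
		for (self.tags) |tag| allocator.free(tag);
		allocator.free(self.tags);
		allocator.free(self.name);
	}
};

fn loadConfig(allocator: std.mem.Allocator, json: []const u8) !Config {
	const parsed = try std.json.parseFromSlice(Config, allocator, json, .{});
	// The arena is dropped on return, so everything we keep must be copied first.
	defer parsed.deinit();

	const name = try allocator.dupe(u8, parsed.value.name);
	errdefer allocator.free(name);

	const tags = try allocator.alloc([]const u8, parsed.value.tags.len);
	errdefer allocator.free(tags);

	// Track how many tags were copied so a failure part-way through can clean up.
	var copied: usize = 0;
	errdefer {
		for (tags[0..copied]) |tag| allocator.free(tag);
	}
	for (parsed.value.tags, 0..) |tag, i| {
		tags[i] = try allocator.dupe(u8, tag);
		copied += 1;
	}

	return .{ .name = name, .tags = tags };
}
{% endhighlight %}

<p>Even with only two fields, every allocation is made twice and the cleanup logic has to mirror the shape of the struct, which is why this doesn't scale to nested objects.</p>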

<p>Since <code>parseFromTokenSourceLeaky</code> is public, we can call it directly and provide our own <code>std.heap.ArenaAllocator</code>. It isn't that different from using <code>parseFromTokenSource</code>, but in some cases you might already have a suitable <code>ArenaAllocator</code>, or suitable lifetime hooks in place. In such cases, using the <code>Leaky</code> variant makes sense.</p>
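
<p>A minimal sketch of that idea, assuming the caller already owns an arena (say, one per request or one for the whole application). To keep it self-contained it uses the slice-based sibling, <code>std.json.parseFromSliceLeaky</code>, rather than <code>parseFromTokenSourceLeaky</code>. The <code>Config</code> fields and the <code>loadConfig</code> name are hypothetical:</p>

{% highlight zig %}
const std = @import("std");

// Hypothetical Config, used only for this example.
const Config = struct {
	name: []const u8,
	tags: [][]const u8,
};

// Parses into an arena the caller already owns. Nothing is freed here: the
// returned Config stays valid until the caller resets or deinits the arena.
fn loadConfig(arena: *std.heap.ArenaAllocator, json: []const u8) !Config {
	return std.json.parseFromSliceLeaky(Config, arena.allocator(), json, .{
		// Copy strings into the arena so the result doesn't point into `json`.
		.allocate = .alloc_always,
	});
}
{% endhighlight %}

<p>The function can now return a plain <code>Config</code>, because the lifetime question has moved to whoever owns the arena.</p>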

<p>While it might seem superficial, I really dislike exposing <code>std.json.Parsed(T)</code>. The fact that <code>T</code> came from JSON is an irrelevant implementation detail. Also, there's nothing JSON or parsing-specific about <code>std.json.Parsed</code>. It's just an <code>ArenaAllocator</code> and your value. So, while it doesn't really change anything, given no other option, I like to create my own little <code>Parsed(T)</code>-like wrapper:</p>

{% highlight zig %}
pub fn Managed(comptime T: type) type {
	return struct {
		value: T,
		arena: *std.heap.ArenaAllocator,

		const Self = @This();

		pub fn fromJson(parsed: std.json.Parsed(T)) Self {
			return .{
				.arena = parsed.arena,
				.value = parsed.value,
			};
		}

		pub fn deinit(self: Self) void {
			const arena = self.arena;
			const allocator = arena.child_allocator;
			arena.deinit();
			allocator.destroy(arena);
		}
	};
}
{% endhighlight %}

<p>This means that I can pass a <code>myApp.Managed(Config)</code> around rather than an <code>std.json.Parsed(Config)</code>.</p>

<p>Our original example, rewritten as a function that returns a <s><code>Config</code></s> <code>Managed(Config)</code>, looks like:</p>

{% highlight zig %}
fn parseConfig(file_path: []const u8, allocator: Allocator) !Managed(Config) {
	const file = try std.fs.openFileAbsolute(file_path, .{});
	defer file.close();

	var buffered = std.io.bufferedReader(file.reader());
	var reader = std.json.reader(allocator, buffered.reader());
	defer reader.deinit();

	const parsed = try std.json.parseFromTokenSource(Config, allocator, &reader, .{
		.allocate = .alloc_always,
	});
	return Managed(Config).fromJson(parsed);
}
{% endhighlight %}
